RTL8814AU: end-to-end TX via opt-in kernel-driver init replay#29

Merged
josephnef merged 1 commit into master from feat/8814au-tx-replay-from-kernel-trace
May 22, 2026

@josephnef
Collaborator

Summary

Two changes that together let an 8814AU chip actually transmit on-air under devourer's monitor-mode injection path:

1. TX descriptor byte-identical to kernel-driver

Verified by usbmon capture of an aircrack-ng/morrownr 8814au kernel-driver session injecting a probe-request frame on the same chip and channel, diffed against devourer's descriptor. Eight fields differed:

| Field | Was | Now | Rationale |
|---|---|---|---|
| MACID | 0 | 1 | broadcast/default CAM |
| RATE_ID (non-VHT) | 7 | 8 | rate-table index |
| GID | 0 | 63 (0x3F) | no-group default |
| SW_DEFINE | 0 | 1 | DriverFixedRate flag |
| RETRY_LIMIT_ENABLE | 0 | 1 | mgmt-frame default |
| DATA_RETRY_LIMIT | 0 | 12 | upstream rtl8814au_xmit.c:267 |
| SPE_RPT | 1 | 0 | kernel does not set |
| DISABLE_FB | 1 | 0 | kernel does not set |

Devourer's first TX bulk-OUT now reads 64002885 01120800 0000003f 00010000 00003200 00000000 01000000 76a90000 — byte-identical to the kernel-driver's TX descriptor.
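
The byte-for-byte comparison can be reproduced offline from two usbmon hex dumps. The following is a minimal sketch, not devourer's tooling: `parse_words` and `diff_descriptors` are hypothetical helper names, and it assumes the dump is printed in wire byte order, as usbmon's text output does.

```python
def parse_words(dump: str) -> bytes:
    """Turn a usbmon-style hex dump ("64002885 01120800 ...") into raw bytes."""
    return bytes.fromhex("".join(dump.split()))


def diff_descriptors(ours: bytes, kernel: bytes) -> list:
    """Return (offset, our_byte, kernel_byte) for every differing byte."""
    if len(ours) != len(kernel):
        raise ValueError(f"length mismatch: {len(ours)} vs {len(kernel)}")
    return [(i, a, b) for i, (a, b) in enumerate(zip(ours, kernel)) if a != b]


# The descriptor quoted in this PR: eight 32-bit words, 32 bytes in total.
KERNEL = parse_words(
    "64002885 01120800 0000003f 00010000 "
    "00003200 00000000 01000000 76a90000"
)
```

An empty list from `diff_descriptors(ours, KERNEL)` corresponds to the "byte-identical" claim; any non-empty result names the offsets still to be fixed.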

2. Opt-in DEVOURER_OOT_REPLAY=1

Runs a verbatim replay of the kernel-driver's post-fwdl vendor-write sequence (4464 writes between the last fwdl bulk chunk and first TX bulk OUT, captured via usbmon) at end of init.

Even after PRs #25/#26/#27, devourer's HAL init leaves the chip in a state that diverges from the kernel driver's in many small ways. Together these wedge the chip's USB controller: bulk OUT EP 0x02 NAKs every TX URB. With the replay applied, devourer's chip state matches the kernel driver's byte-for-byte (verified via live pyusb register dump) and TX URBs drain.
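
The shape of an opt-in replay can be sketched as follows. This is illustrative only: the real table lives in C++ and its entry layout is not shown in this PR, so the `(address, payload)` tuple, the `write_reg` callback, and the exact `== "1"` check are all assumptions.

```python
import os
from typing import Callable, Iterable, Tuple

# One replay entry: (register address, payload bytes). Assumed layout.
ReplayEntry = Tuple[int, bytes]


def replay_enabled(env=os.environ) -> bool:
    """True only when DEVOURER_OOT_REPLAY is set to "1" (assumed semantics)."""
    return env.get("DEVOURER_OOT_REPLAY") == "1"


def replay(table: Iterable[ReplayEntry],
           write_reg: Callable[[int, bytes], None]) -> int:
    """Issue every captured write in capture order; return how many were sent."""
    count = 0
    for addr, payload in table:
        write_reg(addr, payload)  # hypothetical vendor-write helper
        count += 1
    return count
```

Passing the write function in as a parameter keeps the replay loop testable without hardware; ordering is preserved because the sequence is replayed verbatim rather than deduplicated or sorted.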

Authoritative usbmon capture, 5-second steady-state TX window:

140-byte bulk OUT submitted:    566
completed status=0:             566
completed status<0:               0

(Repeatable across multiple runs.)
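
Counts like these can be derived from usbmon's text output. The sketch below assumes the standard text format (URB tag, timestamp, event type, address, status, length) and matches each completion to its submission by URB tag; `count_bulk_out` is a hypothetical helper, not part of this PR.

```python
from collections import Counter


def count_bulk_out(lines, endpoint=2, length=140):
    """Count submissions and completions of bulk OUT URBs in usbmon text."""
    counts = Counter()
    pending = set()  # URB tags submitted with the length of interest
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue
        tag, event, address = fields[0], fields[2], fields[3]
        parts = address.split(":")  # "Bo:1:005:2" -> type, bus, device, ep
        if len(parts) != 4 or parts[0] != "Bo" or int(parts[3]) != endpoint:
            continue
        if event == "S" and int(fields[5]) == length:
            pending.add(tag)
            counts["submitted"] += 1
        elif event == "C" and tag in pending:
            pending.discard(tag)
            counts["ok" if int(fields[4]) == 0 else "failed"] += 1
    return counts
```

For a clean run, `submitted` and `ok` should be equal and `failed` should be zero, matching the three rows above.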

With replay disabled (default), bulk OUT continues to time out at the 500ms USB_TIMEOUT — unchanged behaviour vs prior master.

Why opt-in and not default-on

The replay's BB writes significantly slow the chip's RX throughput (RX-packet rate drops ~10× in a 60-second window). The trade-off is acceptable for TX-only workloads (injection-only monitor mode); RX-only users keep current behaviour by leaving the env var unset.

Long-term path

Replace the verbatim replay by porting the equivalent upstream init functions individually (rtl8814a_hal_init.c + usb_halinit.c) so TX works without the RX trade-off and without 130 KB of opaque trace data shipped in the binary. The verbatim replay is the minimum that actually unblocks TX today and serves as a regression checkpoint while the functions get ported.

How to use

# 8814AU TX from monitor mode:
sudo DEVOURER_PID=0x8813 DEVOURER_CHANNEL=6 DEVOURER_OOT_REPLAY=1 \
  ./build/WiFiDriverTxDemo

Verification done

  • Build green on macOS + Arch Linux 6.18
  • Default (no env var): 8814 RX unchanged from master (WiFiDriverDemo on 0bda:8813)
  • DEVOURER_OOT_REPLAY=1: bulk OUT URBs complete status=0 from the chip (usbmon-verified across multiple runs)
  • TX descriptor byte-identical to kernel-driver TX (usbmon-verified)
  • Live pyusb register dump confirms chip state matches kernel-driver byte-for-byte at all 23 previously diverging addresses
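
The register comparison in the last item amounts to a dictionary diff over two dumps. A minimal sketch, assuming each dump has already been read into an address-to-value mapping (the helper name and the sample addresses in the usage note are placeholders, not values from this PR):

```python
def diverging_registers(ours: dict, kernel: dict) -> dict:
    """Map address -> (our value, kernel value) for every mismatch.

    Addresses missing from our dump are reported with None as our value.
    """
    return {
        addr: (ours.get(addr), kernel[addr])
        for addr in sorted(kernel)
        if ours.get(addr) != kernel[addr]
    }
```

For example, `diverging_registers({0x10: 0xAA, 0x14: 0x01}, {0x10: 0xAA, 0x14: 0x02})` reports only address `0x14`; an empty result is what "matches byte-for-byte" means here.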

Not verified

On-air sniffer verification was not possible in the current lab setup: the aircrack-ng 88XXau OOT driver needed for the 8812 sniffer fails to build against kernel 6.18. The combined evidence (usbmon-verified URB completions, byte-identical chip state, and a descriptor byte-identical to that of a known-working kernel-driver TX session) supports the end-to-end TX claim, but air-side verification on a receiving adapter remains a follow-up.

🤖 Generated with Claude Code

Two changes that together let an 8814AU chip actually transmit on-air
under devourer's monitor-mode injection path:

1. TX descriptor now byte-identical to the kernel-driver descriptor.
   Verified by usbmon capture of an aircrack-ng/morrownr 8814au
   kernel-driver session injecting a probe-request frame on the same
   chip and channel, diffed against devourer's descriptor. Eight fields
   differed; values previously came from speculative WIP comments that
   didn't hold up empirically:

     MACID            = 1   (was 0)         broadcast/default CAM
     RATE_ID (non-VHT)= 8   (was 7)         rate-table index
     GID              = 63  (was 0)         no-group default
     SW_DEFINE        = 1   (was 0)         DriverFixedRate flag
     RETRY_LIMIT_ENABLE=1, DATA_RETRY_LIMIT=12  (were both 0)
     SPE_RPT          = 0   (was 1)         kernel does not set
     DISABLE_FB       = 0   (was 1)         kernel does not set

   Devourer's first TX bulk-OUT now reads:
     64002885 01120800 0000003f 00010000 00003200 00000000 \
     01000000 76a90000
   Byte-identical to the kernel-driver's TX descriptor.

2. Opt-in `DEVOURER_OOT_REPLAY=1` runs a verbatim replay of the
   kernel-driver's post-fwdl vendor-write sequence at end of init.

   Devourer's HAL init (even after PR #25/#26/#27) leaves the chip in
   a state that diverges from the working kernel-driver in many small
   ways which combine to wedge the chip's USB controller — bulk OUT
   EP 0x02 NAKs every TX URB. With the replay applied, devourer's
   chip-state matches the kernel byte-for-byte (verified via live
   pyusb register dump) and TX URBs drain. Authoritative usbmon
   capture during a 5-second steady-state TX window:

     140-byte bulk OUT submitted:    566
     completed status=0:             566
     completed status<0:               0

   With replay disabled (default), bulk OUT continues to time out at
   the 500ms USB_TIMEOUT (unchanged behaviour vs prior master).

   The replay table (hal/Hal8814_PostFwdlReplay.h, 4464 entries
   covering the writes between the last fwdl bulk chunk and the first
   TX bulk OUT in a kernel-driver usbmon capture) is opt-in because
   it also includes BB writes that significantly slow the chip's RX
   throughput (RX-packet rate drops ~10x in a 60-second window). The
   tradeoff is acceptable for TX-only workloads (e.g. injection-only
   monitor mode); RX-only users keep current behaviour by leaving the
   env var unset.

   Long-term, the replay should be replaced by porting the
   equivalent upstream init functions individually
   (rtl8814a_hal_init.c + usb_halinit.c) so TX works without the RX
   trade-off and without 130KB of opaque trace data shipped in the
   binary. The verbatim replay is the minimum that actually unblocks
   TX today and serves as a regression checkpoint.

Verified on CF-938AC (0bda:8813, channel 6):
  - Default (no env var): 8814 RX unchanged from master.
  - DEVOURER_OOT_REPLAY=1: bulk OUT URBs complete status=0 from
    the chip (usbmon-verified, repeatable across runs).
  - TX descriptor matches kernel-driver byte-for-byte.

Note: on-air sniffer verification was not possible in the current lab
setup (the aircrack-ng 88XXau OOT driver needed for the 8812 sniffer
fails to build against kernel 6.18). The combined evidence
(usbmon-verified URB completions + byte-identical chip-state +
a descriptor byte-identical to that of a known-working kernel-driver
TX session) supports the end-to-end TX claim, but air-side
verification on the receiving adapter is a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@josephnef josephnef merged commit 5de6d62 into master May 22, 2026
5 checks passed
@josephnef josephnef deleted the feat/8814au-tx-replay-from-kernel-trace branch May 22, 2026 15:32
josephnef added a commit that referenced this pull request May 23, 2026
## Status: WIP — chip init succeeds, RX data flow still silent

Brings up the RTL8821AU end-to-end except for RX bulk-IN data flow.
Posting so @RomanLut and others can continue the investigation from a
clean checkpoint that builds against current master instead of
resurrecting #22 (which deletes the 8814AU work and has a parallel
dispatch enum that doesn't compose with the `ICType`-based convention
#23-#29 established).

## What works on T2U Plus 2357:0120 (verified on macOS)

- Chip detection: `CHIP_8821_Normal_Chip_TSMC_D_CUT_1T1R`
- 8821 power-on flow (`rtl8821A_card_enable_flow`)
- Firmware download — 8821 blob, signature `0x2101`, FW ready in ~30ms
- MAC/BB/AGC/RadioA register tables via existing `PhyTableLoader` (same
phydm conditional encoding as 8814AU)
- RFE pinmux (`phy_SetRFEReg8821`), band switch 2.4G/5G, channel + TX
power table on ch6/36/100
- USB endpoint discovery: bulk IN 0x84, OUTs 0x05/0x06/0x08/0x09 (vs
8812/8814's 0x81 and 0x02-0x05)
- `libusb_clear_halt` on IN + `REG_USB_HRPWM=0x84` LPS wake
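
As a small illustration of the endpoint-discovery item above: the direction of a USB endpoint is encoded in bit 7 of its address, which is why 0x84 is an IN endpoint and 0x05-0x09 are OUT. The helper below is a hypothetical sketch that only splits by direction; distinguishing bulk from other transfer types would additionally need `bmAttributes`.

```python
USB_DIR_IN = 0x80  # bit 7 of bEndpointAddress: set for IN, clear for OUT


def split_endpoints(addresses):
    """Split endpoint addresses into (IN list, OUT list) by direction bit."""
    ins = [a for a in addresses if a & USB_DIR_IN]
    outs = [a for a in addresses if not a & USB_DIR_IN]
    return ins, outs
```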

## What's NOT working

- **RX bulk-IN reads succeed at the USB layer but the chip never pushes
data.** 0 RX packets across 15s on ch100 even with the host Mac actively
associated to a busy 5GHz AP. The chip-internal RX-DMA → bulk-IN-EP
binding isn't engaging despite all known init steps.
- TX path is wired (correct OUT EP, no `LIBUSB_ERROR_NOT_FOUND`) but
unvalidated end-to-end on 8821AU — no peer sniffer in this session.

## Regression matrix (Linux trainer-arch, master vs `feat/rtl8821au-support`)

| Adapter | Test | Result | vs master |
|---|---|---|---|
| 8812AU (`0bda:8812`) | RX | 41 pkts / 15s, 0 errors, `CHIP_8812` detected | ✓ no regression |
| 8812AU | TX | 15 prints all `rc=1`, 0 failures (2 runs) | ✓ no regression |
| 8814AU (`0bda:8813`) | RX | 0 pkts | ✓ matches master baseline (pre-existing) |
| 8814AU | TX | `rc=1`, ~270-320 async failures (timing-variant) | ✓ matches master baseline |

The new `CHIP_8821` dispatch correctly routes 8812 → 8812 path and 8814
→ 8814 path. No misrouting.

## Suggested next steps for whoever picks this up

1. usbmon trace of `aircrack-ng/rtl8812au`'s RX bring-up against an
8821AU on Linux; diff post-fwdl register writes vs ours
(REG_TRXDMA_CTRL, REG_USB_AGG_TH/TO, REG_RXDMA_AGG_PG_TH,
REG_USB_SPECIAL_OPTION).
2. Compare register state post-init (kernel-driver readback vs our
post-init pyusb dump). Same technique that unblocked the 8814AU TX work.
3. Re-read @RomanLut's #22 `HalModule.cpp` for any 8821-specific init
steps I didn't carry over when rewiring through `ICType`.

## What this preserves

- 8812AU support — untouched (default dispatch branch)
- 8814AU support — untouched (#23-#29 work preserved)
- Same `HAL_IC_TYPE_E` / `ICType` dispatch pattern; no parallel enum

## Attribution

8821a HAL data (`Hal8821APwrSeq`, `Hal8821PhyReg`, `hal8821a_fw`) ported
verbatim from @RomanLut's #22 (svpcom/rtl8812au v5.2.20). Wiring re-done
to follow master's existing convention.

Refs #20 (BadPotato1007's underlying request) and #22 (RomanLut's
original port).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: RomanLut <noreply@github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
josephnef added a commit that referenced this pull request May 23, 2026
## What this is

A manual-run Python orchestrator that compares devourer's userspace
stack against the kernel driver (mainline `rtw88` / out-of-tree
`aircrack-ng/rtl8812au`) on a host with two plugged-in USB Wi-Fi
adapters. Emits a markdown table — designed to paste into PR review
comments.

```
                  TX = devourer        TX = kernel
RX = devourer     [end-to-end dvr]     [does dvr RX a kernel-TX frame?]
RX = kernel       [does dvr emit       [baseline / rig sanity]
                   valid frames?]
```

Each cell injects/receives the canonical beacon (SA `57:42:75:05:d6:00`,
matching `txdemo/main.cpp`) for `--duration` seconds and counts hits.
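
A rough sketch of how the four cells could be rendered into the markdown table. This is not the code in `tests/regress.py`; `render_matrix` and the shape of `results` are assumptions made for illustration.

```python
DRIVERS = ("devourer", "kernel")


def render_matrix(results):
    """Render {(rx_driver, tx_driver): cell_text} as a markdown table."""
    lines = [
        "|   | " + " | ".join(f"TX = {tx}" for tx in DRIVERS) + " |",
        "|---|" + "---|" * len(DRIVERS),
    ]
    for rx in DRIVERS:
        # Cells that were never run are shown explicitly rather than left blank.
        cells = [results.get((rx, tx), "not run") for tx in DRIVERS]
        lines.append(f"| RX = {rx} | " + " | ".join(cells) + " |")
    return "\n".join(lines)
```

Keying the results by `(rx, tx)` keeps the measurement loop independent of the table layout, so a failed or skipped cell still produces a well-formed row.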

## Why now

PRs like #30 (RTL8821AU partial bring-up) need cross-driver validation:
"does devourer's TX really emit valid frames?" and "can devourer RX a
frame the kernel driver knows works?". Running these checks manually is
fiddly (modprobe / unbind / iw / tcpdump dance per cell); this script
does it in one command and prints a structured result.

This is **not** a 24x7 CI runner — too few PRs to justify the
infrastructure. It's a script the reviewer runs on demand on a test rig.

## Usage

```bash
cd /path/to/devourer && cmake --build build -j
sudo python3 tests/regress.py --channel 100
```

See [`tests/README.md`](tests/README.md) for full options + prereqs.

## First-run validation on trainer-arch

Arch Linux, kernel 6.x, USB hub with 0bda:8812 (8812AU) + 0bda:8813
(8814AU):

```
## Regression matrix — channel 100
- TX adapter: 0bda:8813 (RTL8814AU)
- RX adapter: 0bda:8812 (RTL8812AU)

|   | TX = devourer | TX = kernel |
|---|---|---|
| RX = devourer | 0 hits / 10 TX (437 fail) / 10s ✗ | 0 hits / 0 TX / 0s ✗ |
| RX = kernel | 1 hits / 10 TX (351 fail) / 10s ✓ | 0 hits / 0 TX / 0s ✗ |
```

The **devourer-TX(8814) → kernel-RX(8812) cell passed** — independent
confirmation that #29's 8814AU TX bring-up really does land frames on
the air. The remaining cells correctly identified the rig's known
limitations: mainline `rtw88_8814au` can't probe this 8814AU dongle on
this kernel (`failed to download firmware`, probe error -22), and 8814AU
RX is a pre-existing TODO.

## Portability

- Tool paths resolved via `which` (no `/usr/bin/X` hardcoding)
- Wlan iface names discovered via `iw dev` (works for systemd `wlp*` and
classic `wlan*`)
- Kernel driver claiming each DUT read from sysfs (no hardcoded module
names)
- Preflight check prints distro-agnostic install hints if anything's
missing
- Tested on Arch; should work on any modern Linux with `iw`, `tcpdump`,
`python3-scapy`, `aircrack-ng`
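
The `which`-based preflight could look roughly like this. It is a sketch under assumptions: `missing_tools` is a hypothetical name, and the actual script may resolve paths and print install hints differently.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools(["iw", "tcpdump"])` before the first cell lets the script fail early with a hint instead of partway through a run; the `which` parameter exists only so the check can be exercised without those tools installed.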

## VM-readiness

The kernel-cell shell-outs all go through one function
(`run_kernel_cmd`). Today: local exec. To migrate the kernel driver into
a pinned-kernel VM (recommended once host kernel upgrades start breaking
the out-of-tree aircrack-ng driver), wrap that function with `ssh
trainer-vm sudo` and arrange USB hot-plug passthrough via libvirt. The
matrix orchestrator doesn't need to change.
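
One way the single choke point might be wrapped for a VM is sketched below. The real `run_kernel_cmd` signature is not shown in this PR, so the `ssh_host` parameter and the `kernel_argv` helper are assumptions.

```python
import shlex
import subprocess


def kernel_argv(cmd, ssh_host=None):
    """Build the argv for a kernel-side command, locally or via ssh + sudo."""
    if ssh_host is None:
        return list(cmd)
    # ssh joins its arguments into one remote shell line, so quote them first.
    return ["ssh", ssh_host, "sudo " + shlex.join(cmd)]


def run_kernel_cmd(cmd, ssh_host=None):
    """Run a kernel-side command and return the CompletedProcess."""
    return subprocess.run(
        kernel_argv(cmd, ssh_host), capture_output=True, text=True, check=False
    )
```

Separating argv construction from execution means the local-versus-remote decision can be checked without running anything, and callers in the matrix code stay unchanged.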

## Known limitations (documented in README)

- Tests "signal of life", not throughput: air noise makes absolute
counts unreliable, so the default pass threshold is 1 hit, with guidance
to bump it for higher-confidence runs.
- Sequential matrix takes ~100s for 4 cells (devourer fwdl warmup + 4 ×
~25s).
- Two-adapter scope today. Extending to >2 is a pairing loop in
`main()`.
- One known bug: `<devourer-tx>TX #N` prints are rate-limited so when
the chip is failing every send, the parser undercounts attempts.
Mitigated by surfacing failure count separately in the output.

## Test plan

- [x] Builds + runs on trainer-arch (Arch + kernel 6.x)
- [x] Markdown table emitted correctly
- [x] At least one cell passes against real hardware (8814 dvr-TX → 8812
kernel-RX)
- [ ] Validate on a different distro (Ubuntu / Fedora) — anyone with a
2-adapter rig
- [ ] Validate against the out-of-tree `aircrack-ng/rtl8812au` driver
instead of mainline rtw88

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>